Systems and methods for organizing information relating to a study of
polymorphisms. A database model (102) is provided which interrelates
information about one or more of, e.g., subjects (112) from whom samples
(114) are extracted, primers used in extracting the DNA from the subjects,
the samples themselves, experiments done on samples, particular
oligonucleotide probe arrays used to perform experiments, analysis
procedures performed on the samples, and analysis results. The model is
readily translatable into database languages such as SQL. The database
model scales to permit storage of information about large numbers of
subjects, samples, experiments, chips, etc.
SYSTEM FOR PROVIDING A POLYMORPHISM DATABASE



CROSS-REFERENCE TO RELATED APPLICATIONS 

The present application claims priority from U.S. Prov. App. No.
60/053,842 filed July 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS
DATABASE, from U.S. Prov. App. No. 60/069,198 filed on December 11, 1997,
entitled COMPREHENSIVE DATABASE FOR BIOINFORMATICS, and from U.S.
Prov. App. No. 60/069,436, entitled GENE EXPRESSION AND EVALUATION
SYSTEM, filed on December 11, 1997. The contents of all three provisional
applications are herein incorporated by reference.

The subject matter of the present application is related to the subject
matter of the following three co-assigned applications filed on the same day as the
present application: GENE EXPRESSION AND EVALUATION SYSTEM (Attorney
Docket No. 018547-035010), METHOD AND APPARATUS FOR PROVIDING A
BIOINFORMATICS DATABASE (Attorney Docket No. 018547-033810), and METHOD
AND SYSTEM FOR PROVIDING A PROBE ARRAY CHIP DESIGN DATABASE
(Attorney Docket No. 018547-033830). The contents of these three applications are
herein incorporated by reference.

BACKGROUND OF THE INVENTION 
The present invention relates to the collection and storage of information 
pertaining to chips for processing biological samples and thereby identifying 
polymorphisms. 

The genomes of all organisms undergo spontaneous mutation in the course
of their continuing evolution, generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary
advantage or disadvantage relative to a progenitor form or may be neutral. In some
instances, a variant form confers a lethal disadvantage and is not transmitted to
subsequent generations of the organism. In other instances, a variant form confers an
evolutionary advantage to the species and is eventually incorporated into the DNA of
many or most members of the species and effectively becomes the progenitor form. In
many instances, both progenitor and variant form(s) survive and co-exist in a species
population. The coexistence of multiple forms of a sequence gives rise to
polymorphisms.

Despite the increased amount of nucleotide sequence data being generated
in recent years, only a minute proportion of the total repository of polymorphisms in
humans and other organisms has so far been identified. The paucity of polymorphisms
hitherto identified is due to the large amount of work required for their detection by
conventional methods. For example, a conventional approach to identifying
polymorphisms might be to sequence the same stretch of oligonucleotides in a population
of individuals by dideoxy sequencing. In this type of approach, the amount of work
increases in proportion to both the length of sequence and the number of individuals in a
population and becomes impractical for large stretches of DNA or large numbers of
persons.

Devices and computer systems for forming and using arrays of materials
on a substrate have been developed. These devices and systems have been used for
identifying polymorphisms. For example, PCT application WO92/10588, incorporated
herein by reference for all purposes, describes techniques for sequencing or sequence
checking nucleic acids and other materials. Arrays for performing these operations may
be formed according to the methods of, for example, the pioneering techniques
disclosed in U.S. Patent No. 5,143,854 and U.S. Patent No. 5,571,639, both
incorporated herein by reference for all purposes.

According to one aspect of the techniques described therein, an array of
nucleic acid probes is fabricated at known locations on a chip or substrate. A
fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner
generates an image file indicating the locations where the labeled nucleic acids bound to
the chip. Based upon the identities of the probes at these locations, it becomes possible
to extract information such as the identity of polymorphic forms of DNA or RNA.
Such systems have been used to form, for example, arrays of DNA that may be used to
study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain
cancers), HIV, and other genetic characteristics.
It would be highly useful to apply such arrays to the study of
polymorphisms on a large scale. For example, it would be useful to conduct large scale
studies on the correlation between certain polymorphisms and individual characteristics
such as susceptibility to diseases and effectiveness of drug treatments. To achieve these
benefits, it is contemplated that the operations of chip design, construction, sample
preparation, and analysis will occur on a very large scale. The quantity of information
related to each of these steps that must be stored and correlated is vast. For large scale
polymorphism studies, it will be necessary to store this information in a way that
facilitates later querying and retrieval. What is needed is a system and method
suitable for storing and organizing large quantities of information used in conjunction
with polymorphism studies.

SUMMARY OF THE INVENTION 
The present invention provides systems and methods for organizing
information relating to the study of polymorphisms. A database model is provided which
interrelates information about one or more of, e.g., subjects from whom samples are
extracted, primers used in extracting the DNA from the subjects, the samples
themselves, experiments done on samples, particular oligonucleotide probe
arrays used to perform experiments, analysis procedures performed on the
samples, and analysis results. The model is readily translatable into database
languages such as SQL. The database model scales to permit storage of information
about large numbers of subjects, samples, experiments, chips, etc.

Applications include linkage studies to determine resistance to drugs,
susceptibility to diseases, and study of every characteristic of humans and other
organisms that is related to genetic variability. Another application of a database
constructed according to this model is quality control of the various steps of performing a
polymorphism study. By preserving information about every step of a polymorphism
study, one can assess the reliability of the results or use the preserved information as
feedback to improve procedures.

A further understanding of the nature and advantages of the inventions
herein may be realized by reference to the remaining portions of the specification and the
attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates an overall system and process for forming and analyzing
arrays of biological materials such as DNA or RNA.

Fig. 2A illustrates a computer system suitable for use in conjunction with
the overall system of Fig. 1.

Fig. 2B illustrates a computer network suitable for use in conjunction with
the overall system of Fig. 1.

Fig. 3 illustrates a key for interpreting a database model.

Figs. 4A-4H illustrate a database model for maintaining information for
the system and process of Fig. 1 according to one embodiment of the present invention.

DESCRIPTION OF SPECIFIC EMBODIMENTS 
Investigation of Polymorphisms 

A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid from an individual
being analyzed. For assay of genomic DNA, virtually any biological sample (other than
pure red blood cells) is suitable. For example, convenient tissue samples include whole
blood, semen, saliva, tears, urine, fecal material, sweat, buccal cells, skin and hair. For
assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which
the target nucleic acid is expressed. For example, if the target nucleic acid is a
cytochrome P450, the liver is a suitable source.

Many of the methods described below require amplification of DNA from
target samples. This can be accomplished by, e.g., PCR. See generally PCR
Technology: Principles and Applications for DNA Amplification (ed. H.A. Erlich,
Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications
(eds. Innis, et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids
Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR
(eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202 (each of which
is incorporated by reference for all purposes).

Other suitable amplification methods include the ligase chain reaction
(LCR) (see Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241,
1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86,
1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad.
Sci. USA, 87, 1874 (1990)), and nucleic acid based sequence amplification (NASBA).
The latter two amplification methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA (ssRNA) and double stranded
DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1,
respectively.



B. Detection of Polymorphisms in Target DNA 

There are two distinct types of analysis depending on whether a
polymorphism in question has already been characterized. The first type of analysis is
sometimes referred to as de novo characterization. This analysis compares target
sequences in different individuals to identify points of variation, i.e., polymorphic sites.
By analyzing groups of individuals representing the greatest ethnic diversity among
humans and greatest breed and species variety in plants and animals, patterns
characteristic of the most common alleles/haplotypes of the locus can be identified, and
the frequencies of such alleles/haplotypes in the population determined. Additional allelic
frequencies can be determined for subpopulations characterized by criteria such as
geography, race, or gender. The second type of analysis is determining which form(s) of
a characterized polymorphism are present in individuals under test. There are a variety
of suitable procedures, which are discussed in turn.



1. Allele-Specific Probes 

The design and use of allele-specific probes for analyzing polymorphisms
is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726;
Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment
of target DNA from one individual but do not hybridize to the corresponding segment
from another individual due to the presence of different polymorphic forms in the
respective segments from the two individuals. Hybridization conditions should be
sufficiently stringent that there is a significant difference in hybridization intensity
between alleles, and preferably an essentially binary response, whereby a probe
hybridizes to only one of the alleles. Some probes are designed to hybridize to a
segment of target DNA such that the polymorphic site aligns with a central position
(e.g., in a 15 mer at the 7 position; in a 16 mer, at either the 8 or 9 position) of the
probe. This design of probe achieves good discrimination in hybridization between
different allelic forms.

Allele-specific probes are often used in pairs, one member of a pair
showing a perfect match to a reference form of a target sequence and the other member
showing a perfect match to a variant form. Several pairs of probes can then be
immobilized on the same support for simultaneous analysis of multiple polymorphisms
within the same target sequence.
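
The pairing of a reference probe with a variant probe can be illustrated with a short Python sketch. It is an illustration only, under stated assumptions: the function names, the 0-based indexing, and the placement of the polymorphic site at offset length // 2 of the probe are hypothetical choices and are not taken from the cited references.

```python
# Hypothetical helper names; sequences are plain strings over A, C, G, T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_pair(target, site, variant_base, length=15):
    """Build (reference_probe, variant_probe) for one polymorphic site.

    `site` is the 0-based index of the polymorphic base in `target`.
    The site is aligned with the central offset (length // 2) of the probe,
    which is an assumption made for this sketch.
    """
    center = length // 2
    start = site - center
    if start < 0 or start + length > len(target):
        raise ValueError("probe window falls outside the target sequence")
    ref_window = target[start:start + length]
    # Substitute the variant base at the central position of the window.
    var_window = ref_window[:center] + variant_base + ref_window[center + 1:]
    # Probes are complementary to the target segment they interrogate.
    return reverse_complement(ref_window), reverse_complement(var_window)


ref_probe, var_probe = probe_pair("ACGTACGTACGTACGTACGTA", 10, "A")
```

The two probes returned differ at a single central base, so a target carrying the reference allele is a perfect match for one probe and a single-base mismatch for the other.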

2. Tiling Arrays 

The polymorphisms can also be identified by hybridization to nucleic acid
arrays, some examples of which are described by WO 95/11995 (incorporated by
reference in its entirety for all purposes). WO 95/11995 also describes subarrays that
are optimized for detection of variant forms of a precharacterized polymorphism. Such
a subarray contains probes designed to be complementary to a second reference
sequence, which is an allelic variant of the first reference sequence. The second group
of probes is designed by the same principles as described in the Examples except that the
probes exhibit complementarity to the second reference sequence. The inclusion of a
second group (or further groups) can be particularly useful for analyzing short
subsequences of the primary reference sequence in which multiple mutations are expected
to occur within a short distance commensurate with the length of the probes (i.e., two or
more mutations within 9 to 21 bases).

3. Allele-Specific Primers 

An allele-specific primer hybridizes to a site on target DNA overlapping a
polymorphism and only primes amplification of an allelic form to which the primer
exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989).
This primer is used in conjunction with a second primer which hybridizes at a distal site.
Amplification proceeds from the two primers leading to a detectable product signifying
the particular allelic form is present. A control is usually performed with a second pair
of primers, one of which shows a single base mismatch at the polymorphic site and the
other of which exhibits perfect complementarity to a distal site. The single-base
mismatch prevents amplification and no detectable product is formed. The method works
best when the mismatch is included in the 3'-most position of the oligonucleotide aligned
with the polymorphism because this position is most destabilizing to elongation from the
primer. See, e.g., WO 93/22456.






WO 99/05324 PCT/US98/15458 

4. Direct-Sequencing 

The direct analysis of the sequence of polymorphisms of the present
invention can be accomplished using either the dideoxy chain termination method or the
Maxam Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual
(2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory
Manual, (Acad. Press, 1988)).

5. Denaturing Gradient Gel Electrophoresis 

Amplification products generated using the polymerase chain reaction can
be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can
be identified based on the different sequence-dependent melting properties and
electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology, Principles
and Applications for DNA Amplification, (W.H. Freeman and Co, New York, 1992),
Chapter 7.

6. Single-Strand Conformation Polymorphism Analysis 

Alleles of target sequences can be differentiated using single-strand
conformation polymorphism analysis, which identifies base differences by alteration in
electrophoretic migration of single stranded PCR products, as described in Orita et al.,
Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated
as described above, and heated or otherwise denatured, to form single stranded
amplification products. Single-stranded nucleic acids may refold or form secondary
structures which are partially dependent on the base sequence. The different
electrophoretic mobilities of single-stranded amplification products can be related to
base-sequence differences between alleles of target sequences.

Biological Material Analysis System 

One embodiment of the present invention operates in the context of a
system for analyzing biological or other materials using arrays that themselves include
probes that may be made of biological materials such as RNA or DNA. The VLSIPS™
and GeneChip™ technologies provide methods of making and using very large arrays of
polymers, such as nucleic acids, on chips. See U.S. Patent No. 5,143,854 and PCT
Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby
incorporated by reference for all purposes. Nucleic acid probes on the chip are used to
detect complementary nucleic acid sequences in a sample nucleic acid of interest (the
"target" nucleic acid).

Fig. 1 illustrates an overall system 100 for forming and analyzing arrays
of biological materials such as RNA or DNA. A part of system 100 is a polymorphism
database 102. Polymorphism database 102 includes information about, e.g., biological
sources, preparation of samples, design of arrays, raw data obtained from applying
experiments to chips, analysis procedures applied, and analysis results.
Polymorphism database 102 facilitates large scale study of polymorphisms.

A chip design system 104 is used to design arrays of polymers, for example
biological polymers such as RNA or DNA. Chip design system 104 may be, for
example, an appropriately programmed Sun Workstation or personal computer or
workstation, such as an IBM PC equivalent, including appropriate memory and a CPU.
Chip design system 104 obtains inputs from a user regarding chip design objectives
including polymorphisms of interest, and other inputs regarding the desired features of
the array. Optionally, chip design system 104 obtains information from external
databases such as GenBank.
The output of chip design system 104 is a set of chip design computer files in the form
of, for example, a switch matrix, as described in PCT application WO 92/10092, and
other associated computer files. The chip design computer files form a part of
polymorphism database 102. Systems for designing chips for study of polymorphisms
are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 95/11995, the
contents of which are herein incorporated by reference.

The chip design files are input to a mask design system (not shown) that
designs the lithographic masks used in the fabrication of arrays of molecules such as
DNA. The mask design system generates mask design files that are then used by
a mask construction system (not shown) to construct masks or other synthesis patterns
such as chrome-on-glass masks for use in the fabrication of polymer arrays.

The masks are used in a synthesis system (not shown). The synthesis
system includes the necessary hardware and software used to fabricate arrays of polymers
on a substrate or chip. The synthesis system includes a light source and a chemical flow
cell on which the substrate or chip is placed. A mask is placed between the light source
and the substrate/chip, and the two are translated relative to each other at appropriate
times for deprotection of selected regions of the chip. Selected chemical reagents are
directed through the flow cell for coupling to deprotected regions, as well as for washing
and other operations. The substrates fabricated by the synthesis system are optionally
diced into smaller chips. The output of the synthesis system is a chip ready for
application of a target sample.
Information about the mask design, mask construction, and probe array
synthesis is presented by way of background. A biological source 112 is, for example,
tissue from a plant or animal. Various processing steps are applied to material from
biological source 112 by a sample preparation system 114. Operation of sample
preparation system 114 in the context of a polymorphism study is discussed below in
further detail.

The prepared samples include nucleic acid sequences such as DNA.
When the sample is applied to the chip by a sample exposure system 116, the nucleic
acids may or may not bind to the probes. The nucleic acids can be tagged with
fluorescein labels to determine which probes have bound to nucleotide sequences from
the sample. The prepared samples will be placed in a scanning system 118. Scanning
system 118 includes a detection device such as a confocal microscope or CCD
(charge-coupled device) that is used to detect the location where labeled receptors have
bound to the substrate. The output of scanning system 118 is an image file(s) indicating,
in the case of fluorescein labeled receptor, the fluorescence intensity (photon counts or
other related measurements, such as voltage) as a function of position on the substrate.
These image files may also form a part of polymorphism database 102. Since higher photon
counts will be observed where the labeled nucleic acid(s) has bound more strongly to the
array of probes, and since the monomer sequence of the probes on the substrate is known
as a function of position, it becomes possible to analyze the sequence(s) of the nucleic
acid(s) that are complementary to the probes.
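
The step from photon counts to sequence can be sketched in Python. This is a deliberately simplified illustration and not the analysis method of the cited references: it assumes, hypothetically, that each interrogated position of the target has four probes that differ only in one substituted base, and that the brightest of the four identifies the base. The function name and the data layout are made up for the example.

```python
# Simplified sketch: one dict of photon counts per interrogated position,
# keyed by the substituted base of each of four assumed probes.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_bases(cells):
    """Return the target sequence implied by the brightest probe per position."""
    called = []
    for counts in cells:
        # The probe with the highest photon count bound most strongly.
        brightest_probe_base = max(counts, key=counts.get)
        # The target base is the complement of the probe base.
        called.append(COMPLEMENT[brightest_probe_base])
    return "".join(called)


# Made-up photon counts for two positions.
scan = [
    {"A": 40, "C": 910, "G": 35, "T": 52},
    {"A": 875, "C": 61, "G": 48, "T": 30},
]
sequence = call_bases(scan)
```

A practical base caller would also have to handle background subtraction and ambiguous calls where two probes give similar counts; those concerns are outside this sketch.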

The image files and the design of the chips are input to an analysis system
120 that, e.g., calls bases. Such analysis techniques are described in EPO Pub. No.
0717113A, the contents of which are herein incorporated by reference.

Chip design system 104, analysis system 120 and control portions of
exposure system 116, sample preparation system 114, and scanning system 118 may be
appropriately programmed computers such as a Sun workstation or IBM-compatible PC.
An independent computer for each system may perform the computer-implemented
functions of these systems or one computer may combine the computerized functions of
two or more systems. One or more computers may maintain polymorphism database 102
independent of the computers operating the systems of Fig. 1 or polymorphism database 102
may be fully or partially maintained by these computers.

Fig. 2A depicts a block diagram of a host computer system 210 suitable for
implementing the present invention. Host computer system 210 includes a bus 212
which interconnects major subsystems such as a central processor 214, a system memory
216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a
display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via
I/O adapter 218, a SCSI host adapter 236, and a floppy disk drive 238 operative to
receive a floppy disk 240. SCSI host adapter 236 may act as a storage interface to a
fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246.
Fixed disk 242 may be a part of host computer system 210 or may be separate and
accessed through other interface systems. A network interface 248 may provide a direct
connection to a remote server via a telephone link or to the Internet. Network interface
248 may also connect to a local area network (LAN) or other network interconnecting
many computer systems. Many other devices or subsystems (not shown) may be
connected in a similar manner.

Also, it is not necessary for all of the devices shown in Fig. 2A to be
present to practice the present invention, as discussed below. The devices and
subsystems may be interconnected in different ways from that shown in Fig. 2A. The
operation of a computer system such as that shown in Fig. 2A is readily known in the art
and is not discussed in detail in this application. Code to implement the present
invention may be operably disposed or stored in computer-readable storage media such
as system memory 216, fixed disk 242, CD-ROM 246, or floppy disk 240.

Fig. 2B depicts a network 260 interconnecting multiple computer systems
210. Network 260 may be a local area network (LAN), wide area network (WAN), etc.
Polymorphism database 102 and the computer-related operations of the other elements of
Fig. 2B may be divided amongst computer systems 210 in any way with network 260
being used to communicate information among the various computers. Portable storage
media such as floppy disks may be used to carry information between computers instead
of network 260.
Overall Description of Database

Polymorphism database 102 is preferably a relational database with a
complex internal structure. The structure and contents of polymorphism database 102 will
be described with reference to a logical model depicted in Figs. 4A-4H that describes the
contents of tables of the database as well as interrelationships among the tables. A visual
depiction of this model is an Entity Relationship Diagram (ERD), which includes
entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin
version 3.0 Methods Guide" available from Logic Works, Inc. of Princeton, NJ, the
contents of which are herein incorporated by reference. Those of skill in the art will
appreciate that automated tools such as Developer 2000 available from Oracle will
convert the ERD from Figs. 4A-4H directly into executable code such as SQL code for
creating and operating the database.

Fig. 3 is a key to the ERD that will be used to describe the contents of
polymorphism database 102. A representative table 302 includes one or more key attributes
304 and one or more non-key attributes 306. Representative table 302 includes one or
more records where each record includes fields corresponding to the listed attributes.
The contents of the key fields taken together identify an individual record. In the ERD,
each table is represented by a rectangle divided by a horizontal line. The fields or
attributes above the line are key while the fields or attributes below the line are non-key.
An identifying relationship 308 signifies that the key attribute of a parent table 310 is
also a key attribute of a child table 312. A non-identifying relationship 314 signifies that
the key attribute of a parent table 316 is also a non-key attribute of a child table 318.
Where (FK) appears in parentheses, it indicates that an attribute of one table is a key
attribute of another table. Both the depicted non-identifying and identifying relationships
are one to one-or-more relationships where one record in the parent table corresponds to
one or more records in the child table. An alternative non-identifying relationship 324 is
a one to zero-or-more relationship where one record in a parent table 320 corresponds to
zero or more records in a child table 322.
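
The difference between an identifying and a non-identifying relationship can be made concrete with a small Python sketch that uses the standard-library sqlite3 module. The table and column names (parent, child_identifying, child_non_identifying and their fields) are hypothetical and do not come from Figs. 3 or 4A-4H; SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,
    name      TEXT
);
-- Identifying relationship: the parent's key is part of the child's key.
CREATE TABLE child_identifying (
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    child_no  INTEGER NOT NULL,
    note      TEXT,
    PRIMARY KEY (parent_id, child_no)
);
-- Non-identifying relationship: the parent's key is a non-key (FK) attribute.
CREATE TABLE child_non_identifying (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    note      TEXT
);
""")


def key_columns(table):
    """List the columns that make up the primary key of `table`."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    return [row[1] for row in rows if row[5] > 0]
```

Calling key_columns on each child table shows the parent key inside the primary key in the identifying case and outside it in the non-identifying case. Note that a NOT NULL foreign key only requires each child to have a parent; the "one or more" cardinality on the parent side is not enforced by this sketch.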

Database Model 

Figs. 4A-4H are entity relationship diagrams (ERDs) showing elements of
polymorphism database 102 according to one embodiment of the present invention. Each
rectangle in the diagram corresponds to a table in database 102.

The interrelationships and general contents of the tables of database 102
will be described first. Then a chart will be presented listing and describing all of the
fields of the various tables.

Fig. 4A illustrates core elements of database 102 according to one
embodiment of the present invention. A subject table 402 lists organisms from which
samples have been extracted for polymorphism analysis or other tissue sources. Samples
may also be obtained from tissue collections not associated with any one identified
organism. Information stored within subject table 402 includes the name, gender,
family, position within family (e.g., father, mother, etc.), and ethnic group. For human
subjects, the name and family will preferably be represented in coded form to assure
privacy. Associated with each subject is a species as listed in a species table 404. Also,
a relationship may be defined among subjects in a subject relationship table 406 which
includes records corresponding to related subjects. These relationships may be
father-mother, sibling, twins, etc. Subjects may be part of a group that is being studied,
e.g., a group with a congenital disease, or a toxic reaction to a particular drug. The groups
are listed in a subject group table 408. Participation of subjects in groups is defined by a
subject participation table 410 which lists all group memberships.
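
The subject-related tables described above can be sketched with Python's sqlite3 module. This is a minimal illustration, not the schema of Fig. 4A: every column name is an assumption, the data rows are made up, and coded identifiers stand in for names as the text suggests for human subjects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species (species_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subject (
    subject_id      INTEGER PRIMARY KEY,
    species_id      INTEGER NOT NULL REFERENCES species(species_id),
    coded_name      TEXT,
    gender          TEXT,
    coded_family    TEXT,
    family_position TEXT,
    ethnic_group    TEXT
);
CREATE TABLE subject_relationship (
    subject_id         INTEGER NOT NULL REFERENCES subject(subject_id),
    related_subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    relationship       TEXT NOT NULL,
    PRIMARY KEY (subject_id, related_subject_id)
);
CREATE TABLE subject_group (group_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Resolves the many-to-many link between subjects and groups.
CREATE TABLE subject_participation (
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    group_id   INTEGER NOT NULL REFERENCES subject_group(group_id),
    PRIMARY KEY (subject_id, group_id)
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO species VALUES (1, 'Homo sapiens')")
conn.executemany(
    "INSERT INTO subject VALUES (?, 1, ?, ?, ?, ?, ?)",
    [(1, "S-0001", "F", "FAM-07", "daughter", None),
     (2, "S-0002", "M", "FAM-07", "son", None)],
)
conn.execute("INSERT INTO subject_relationship VALUES (1, 2, 'sibling')")
conn.execute("INSERT INTO subject_group VALUES (1, 'toxic reaction to a particular drug')")
conn.executemany("INSERT INTO subject_participation VALUES (?, 1)", [(1,), (2,)])


def group_members(group_id):
    """Return the coded names of all subjects participating in a group."""
    rows = conn.execute(
        "SELECT s.coded_name FROM subject AS s "
        "JOIN subject_participation AS p ON p.subject_id = s.subject_id "
        "WHERE p.group_id = ? ORDER BY s.subject_id",
        (group_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping group membership in its own table lets one subject belong to several study groups without duplicating the subject record.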

Samples and their attributes are listed in a sample table 412. Each sample 
has an associated sample type. The sample types are listed in a sample type table 414. 
Possible sample types include blood, urine, etc. Companies or institutions that provide 
samples are listed in a sample source table 416. 

Database 102 provides an item table 418 that includes records for items. 
There are various types of items that correspond to different stages of the sample 
preparation process. An "item derivation" transforms an item of one type into an item of 
another type. The following table lists various item types and item derivation types for 
a representative embodiment. 

Item Type                  Derived from       by Item Derivation Type

Sample                     other samples      pooling
Sample                     other sample       splitting
Extracted DNA              Sample             DNA Extraction
Target                     Extracted DNA      PCR
Labeled Target             Target             Labeling
Hybridized Chip            Labeled Target     Hybridization (application of
                                              target to chip)
Stained Hybridized Chip    Hybridized Chip    Staining


Item derivations are listed in an item derivation table 420. It should be noted that
derivations need not produce a change between item types. Each item derivation occurs
in accordance with a protocol that characterizes the step or steps in the derivation.
Protocols are listed in a protocol table 428. Each item derivation is performed by an
employee listed in employee table 432.
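
Because every item derivation links a derived item to the item it came from, the history of any item can be traced back to the original sample. The Python sketch below shows one way this might look with sqlite3 and a recursive query. The table layout, column names and item identifiers are assumptions made for the example, and protocols and employees are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_type TEXT NOT NULL);
CREATE TABLE item_derivation (
    derived_item_id INTEGER NOT NULL REFERENCES item(item_id),
    source_item_id  INTEGER NOT NULL REFERENCES item(item_id),
    derivation_type TEXT NOT NULL,
    PRIMARY KEY (derived_item_id, source_item_id)
);
""")

# Made-up items forming one chain from sample to stained hybridized chip.
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    (1, "Sample"), (2, "Extracted DNA"), (3, "Target"),
    (4, "Labeled Target"), (5, "Hybridized Chip"), (6, "Stained Hybridized Chip"),
])
conn.executemany("INSERT INTO item_derivation VALUES (?, ?, ?)", [
    (2, 1, "DNA Extraction"), (3, 2, "PCR"), (4, 3, "Labeling"),
    (5, 4, "Hybridization"), (6, 5, "Staining"),
])


def lineage(item_id):
    """Return item types from `item_id` back to its earliest ancestor."""
    rows = conn.execute("""
        WITH RECURSIVE chain(item_id, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT d.source_item_id, chain.depth + 1
            FROM item_derivation AS d
            JOIN chain ON d.derived_item_id = chain.item_id
        )
        SELECT item.item_type
        FROM chain
        JOIN item ON item.item_id = chain.item_id
        ORDER BY chain.depth
    """, (item_id,)).fetchall()
    return [row[0] for row in rows]
```

Calling lineage(6) walks from the stained hybridized chip back through each derivation to the sample. Pooled items would have several sources at the same depth, which this query also returns.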

Unused chips are listed in a chip table 422. Hybridized chips (i.e., chips
that have had target applied) are listed in a hybridized chip table 424. A hybridized
sample map table 426 lists the relationships between hybridized chips and the samples
that have been applied to them.
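
A minimal sketch of the chip, hybridized chip and hybridized sample map tables follows, again in Python with sqlite3. The column names, the design name and the identifiers are hypothetical, and the sample table itself is left out, so sample_id is stored as a plain integer here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chip (chip_id INTEGER PRIMARY KEY, design_name TEXT NOT NULL);
CREATE TABLE hybridized_chip (
    hybridized_chip_id INTEGER PRIMARY KEY,
    chip_id            INTEGER NOT NULL REFERENCES chip(chip_id)
);
-- One row per (hybridized chip, sample) pair; sample table omitted in this sketch.
CREATE TABLE hybridized_sample_map (
    hybridized_chip_id INTEGER NOT NULL REFERENCES hybridized_chip(hybridized_chip_id),
    sample_id          INTEGER NOT NULL,
    PRIMARY KEY (hybridized_chip_id, sample_id)
);
""")

# Made-up rows: one chip hybridized with target derived from two samples.
conn.execute("INSERT INTO chip VALUES (1, 'design-A')")
conn.execute("INSERT INTO hybridized_chip VALUES (10, 1)")
conn.executemany("INSERT INTO hybridized_sample_map VALUES (10, ?)", [(101,), (102,)])


def samples_on_chip(hybridized_chip_id):
    """Return the ids of all samples applied to a hybridized chip."""
    rows = conn.execute(
        "SELECT sample_id FROM hybridized_sample_map "
        "WHERE hybridized_chip_id = ? ORDER BY sample_id",
        (hybridized_chip_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The map table allows a chip to carry material from several samples, and the same sample to appear on several chips.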

Stained hybridized chips are scanned in a process referred to here as a
scan experiment. Scan experiments are listed in a scan experiment table 430. The scan
experiment occurs in accordance with a protocol listed in protocol table 428. The scan
experiment is performed by an employee listed in employee table 432.

Fig. 4B depicts further details of the data model for items and item
derivations. The various item types are listed in an item type table 434 and the various
item derivation types are listed in an item derivation type table 436. The relationships
between successive item types, e.g., sample and target, are defined in an item type
derivation table 438. An item has associated attributes. For example, for a target,
database 102 may store the concentration, volume, location and/or remaining amount.
All item attributes are stored in an item attribute table 440. Item attributes may be
shared among multiple items. For example, a series of targets may all share a
preparation date. An item attribute item map table 442 implements a many-to-many
relationship between item attributes and items. The various types of item attributes such
as preparer, preparation date, etc. are listed in an item attribute type table 444. Each
item type has corresponding attribute types. Some attribute types are, however, shared
among various item types. Accordingly, there is a many-to-many relationship among
item attribute types and item types that is implemented by an item type map table 446.

The tables of Fig. 4B represent a powerfully general model of the sample 
preparation process. Changes in process steps that require changes in the type of 
information that should be stored may be implemented by changing and adding table 
contents rather than providing new tables or changing relationships among tables. 
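
By way of a non-limiting illustration, the following sketch builds a few of the Fig. 4B tables in an in-memory SQLite database using Python. The table and column names follow the chart presented later in this description. The SQLite column types, the helper function names, and any sample values are assumptions made only for illustration and do not form part of the described embodiment.

```python
import sqlite3

# Hypothetical SQLite rendering of part of Fig. 4B; column types are assumed.
SCHEMA = """
CREATE TABLE ItemType (ItemTypeId INTEGER PRIMARY KEY, ItemTypeName TEXT);
CREATE TABLE Item (
    ItemId INTEGER PRIMARY KEY,
    ItemTypeId INTEGER REFERENCES ItemType(ItemTypeId),
    ItemName TEXT
);
CREATE TABLE ItemAttributeType (
    ItemAttributeTypeId INTEGER PRIMARY KEY,
    ItemAttributeName TEXT
);
CREATE TABLE ItemAttribute (
    ItemAttributeId INTEGER PRIMARY KEY,
    ItemAttributeTypeId INTEGER REFERENCES ItemAttributeType(ItemAttributeTypeId),
    Attribute TEXT
);
CREATE TABLE ItemAttributeItemMap (
    ItemAttributeId INTEGER REFERENCES ItemAttribute(ItemAttributeId),
    ItemId INTEGER REFERENCES Item(ItemId),
    PRIMARY KEY (ItemAttributeId, ItemId)
);
"""


def build_item_model():
    """Create the illustrative tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def share_attribute(conn, type_name, value, item_ids):
    """Store one attribute value, map it to several items, return its id."""
    row = conn.execute(
        "SELECT ItemAttributeTypeId FROM ItemAttributeType "
        "WHERE ItemAttributeName = ?",
        (type_name,),
    ).fetchone()
    if row is None:
        # A new kind of attribute is a new row, not a schema change.
        type_id = conn.execute(
            "INSERT INTO ItemAttributeType (ItemAttributeName) VALUES (?)",
            (type_name,),
        ).lastrowid
    else:
        type_id = row[0]
    attr_id = conn.execute(
        "INSERT INTO ItemAttribute (ItemAttributeTypeId, Attribute) VALUES (?, ?)",
        (type_id, value),
    ).lastrowid
    conn.executemany(
        "INSERT INTO ItemAttributeItemMap VALUES (?, ?)",
        [(attr_id, item_id) for item_id in item_ids],
    )
    return attr_id


def items_with_attribute(conn, attr_id):
    """Names of all items mapped to the given attribute value."""
    rows = conn.execute(
        "SELECT i.ItemName FROM Item i "
        "JOIN ItemAttributeItemMap m ON m.ItemId = i.ItemId "
        "WHERE m.ItemAttributeId = ? ORDER BY i.ItemId",
        (attr_id,),
    )
    return [name for (name,) in rows]
```

In this sketch, recording a previously unknown kind of attribute only adds a row to ItemAttributeType, which is the sense in which process changes are absorbed by table contents rather than by new tables.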

Fig. 4C depicts a detailed data model for storing information about
protocols according to the present invention. Protocols as stored in protocol table 428
represent information about particular processes that have been performed including item
derivations, analyses, and scan experiments. Each protocol has an associated protocol
template. Protocol templates identify protocol types. For example, one protocol
template may be a PCR template. All protocols associated with the PCR template
identify parameters for performing a PCR procedure. Protocol templates are listed in a
protocol template table 448. A parameter table 450 lists all the parameters and their
values for all the protocols listed in protocol table 428. A parameter template table 452
lists the various parameter types along with default values. An example of a parameter
template would be a PCR reaction temperature. The parameter template would include a
default value for this parameter. Parameter table 450 might then list many different PCR
reaction temperature values that would be used by many different protocols. If a
parameter value has not been modified by the user, it inherits the standard value of the
associated parameter template. A parameter template set is a set of parameter templates
that are used for a particular purpose, e.g., in association with protocols according to
one or more protocol templates. Parameter template sets are listed in a parameter
template set table 454. There are different types of parameter template set and these are
listed in a parameter template set type table 456. A mapping between parameter template
sets and protocol templates is defined by a protocol template set map table 458.
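
The inheritance of a standard value from a parameter template may be sketched as follows, again using Python with an in-memory SQLite database. The column names follow the Parameter and ParameterTemplate entries of the chart presented later. The function names, the SQLite types, the use of COALESCE with a left join, and the simplification that every parameter template applies to every protocol are assumptions made for illustration only.

```python
import sqlite3


def build_protocol_model():
    """Create simplified Parameter and ParameterTemplate tables (types assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ParameterTemplate (
            ParameterTemplateId INTEGER PRIMARY KEY,
            Name TEXT,
            StandardValue TEXT
        );
        CREATE TABLE Parameter (
            ParameterId INTEGER PRIMARY KEY,
            ParameterTemplateId INTEGER
                REFERENCES ParameterTemplate(ParameterTemplateId),
            ProtocolID INTEGER,
            Value TEXT
        );
    """)
    return conn


def effective_parameters(conn, protocol_id):
    """Map each parameter name to the protocol's value, or to the template default.

    A protocol with no Parameter row for a template falls back to that
    template's StandardValue.
    """
    rows = conn.execute(
        """
        SELECT t.Name, COALESCE(p.Value, t.StandardValue)
        FROM ParameterTemplate AS t
        LEFT JOIN Parameter AS p
               ON p.ParameterTemplateId = t.ParameterTemplateId
              AND p.ProtocolID = ?
        ORDER BY t.ParameterTemplateId
        """,
        (protocol_id,),
    ).fetchall()
    return dict(rows)
```

Placing the protocol filter in the join condition rather than in a WHERE clause keeps templates that have no overriding row, so that their defaults remain visible.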

Protocol templates may have associated lengthy verbal information about
how to perform protocol steps. A protocol template document table 460 stores
references to documents that include instructions for performing protocols.

As with the items, the data model for protocols defined by Fig. 4C is
highly general and allows significant changes in the way item derivations, analyses, and
experiments are performed without changing the underlying data model.

Referring again to Fig. 4A, there are tables to record information
concerning the use of primers in PCR. A fragment table 462 lists all the sequence
fragments investigated in conjunction with database 102. Associated with each fragment
are one or more primer pairs used to amplify the fragment in a PCR process. A primer
pair table 464 lists all the primer pairs including information about whether the primer
pair actually worked to amplify the fragment. In order to develop the information about
the effectiveness of primer pairs, there is a PCR table 466 that lists records identifying
the outcome of multiple PCR operations. The individual PCR operations are identified
by reference to item derivation table 420.

A single PCR operation may be used to amplify many different fragments
and thus employ many different primer pairs. Of course, a single primer pair may be
used in multiple PCR operations. There is therefore a many-to-many relationship
between PCR operations and primer pairs that is recorded by a primer pair PCR map
table 468. Information about individual primers is stored in a primer table 470. Also,
each primer has an associated protocol in protocol table 428 that characterizes the primer
preparation process. Information about primer orders is listed in a primer order table
472. Each primer order is to a vendor and the vendors are listed in a vendor table 474.
Each primer order is made by an employee listed in employee table 432. A primer order
design map table 476 implements a many-to-many relationship between primer orders
and primers.

The data model described here thus preserves information about primers
used in PCR reactions. One can improve results by using primers that have successfully
amplified a given fragment in the past. Sometimes particular groups of primer pairs
cannot be multiplexed together in the same PCR process. The information preserved
here thus permits experimenters to make optimal use of expensive and time-consuming
PCR procedures.
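
As an illustrative sketch only, the track record of the primer pairs for a given fragment could be retrieved as shown below in Python with SQLite. The tables are reduced to the columns needed for the query. The name PrimerPairPCRMap is modeled on primer pair PCR map table 468, and the function names and SQLite types are hypothetical. Because a single PCR operation may employ many primer pairs, the per-pair success count computed here is only an approximation of the effectiveness of each pair.

```python
import sqlite3


def build_primer_model():
    """Create reduced primer pair, PCR, and map tables (types assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblPrimerPair (
            PrimerPairId INTEGER PRIMARY KEY,
            FragmentId INTEGER,
            Worked INTEGER
        );
        CREATE TABLE tblPCR (
            Item1Id INTEGER,
            Item2Id INTEGER,
            ReactionWorked INTEGER,
            PRIMARY KEY (Item1Id, Item2Id)
        );
        CREATE TABLE PrimerPairPCRMap (
            PrimerPairId INTEGER,
            Item1Id INTEGER,
            Item2Id INTEGER
        );
    """)
    return conn


def primer_pair_track_record(conn, fragment_id):
    """Return (PrimerPairId, successful runs, total runs), best pairs first."""
    return conn.execute(
        """
        SELECT pp.PrimerPairId, SUM(pcr.ReactionWorked), COUNT(*)
        FROM tblPrimerPair AS pp
        JOIN PrimerPairPCRMap AS m ON m.PrimerPairId = pp.PrimerPairId
        JOIN tblPCR AS pcr
          ON pcr.Item1Id = m.Item1Id AND pcr.Item2Id = m.Item2Id
        WHERE pp.FragmentId = ?
        GROUP BY pp.PrimerPairId
        ORDER BY SUM(pcr.ReactionWorked) DESC, pp.PrimerPairId
        """,
        (fragment_id,),
    ).fetchall()
```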

It is also useful to preserve information about the chip production process
and the origin of individual chips. A wafer table 478 lists wafers. When chips are
produced, many chips are produced at the same time as part of a single wafer. Chip
table 422 stores references to wafer table 478 for each chip and the location of each chip
on its wafer at production time. Sometimes there is analytic significance associated with
the location of a chip on the wafer. Each wafer is produced as part of a lot and the
identity of the lot for each wafer is recorded by wafer table 478 as a reference to a lot
table 480 that lists each lot.

Fig. 4D depicts further details of tables pertaining to chip design that are 
preferably maintained within polymorphism database 102 according to one embodiment 
of the present invention. A tiling design table 482 lists tiling designs. Each tiling design 
represents the application of a particular tiling format to a sequence to be investigated. 
Tiling formats indicate probe orientation, probe length, and the position within a probe 
of a single nucleotide polymorphism being investigated. In a preferred embodiment, 
there may be very few tiling formats and they are listed in a tiling format table 484. 

A particular tiling design includes many atom designs, each specifying the design
of a single atom. In one embodiment, an atom is a group of typically four probes used
to investigate a single base position with each probe hybridizing to a sequence including 
a different base at that position. Atom designs are listed in an atom design table 486. 
Records identifying the designs of individual probes are listed in a probe design table 
488. A probe design role table 490 indicates the roles of probes listed in probe design 
table 488 in the atom designs of atom design table 486. For combinations of probe 
design and atom design, probe design role table 490 indicates which base the probe 
hybridizes to at the substitution position and whether the probe represents a match or a 
mismatch to the wild type. 

A probe data table 492 gives the hybridization intensity values for
particular probe designs as determined in particular scan experiments. Each record of
the table also gives the number of pixels used to determine the intensity value and the 
standard deviation of intensity as measured among the pixels. 
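
The three values held in each probe data record may be illustrated by the following Python sketch. The field names Intensity, NPixels and StDev follow the chart presented later. That the intensity is the mean of the pixel values, and that the standard deviation is the population form, are assumptions made for this sketch, as the description does not specify either.

```python
from statistics import mean, pstdev


def summarize_probe_pixels(pixels):
    """Reduce one probe's pixel readings to a probe data record.

    Assumes the intensity is the arithmetic mean of the pixels and that the
    population standard deviation is wanted; both are illustrative choices.
    """
    if not pixels:
        raise ValueError("at least one pixel value is required")
    return {
        "Intensity": mean(pixels),
        "NPixels": len(pixels),
        "StDev": pstdev(pixels),
    }
```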

Figs. 4E-4G depict aspects of polymorphism database 102 related to 
analysis procedures and their results according to one embodiment of the present 
invention. An analysis table 494 lists analyses performed. An analysis generally refers 
to a non-trivial transformation of data. Records of analysis table 494 include references 
to protocol table 428 to specify parameters used for each analysis. Analyses may take as 
their input raw data or the results of previous analyses. An analysis dependency table
496 lists dependencies among analyses where one analysis depends on the data developed
by another analysis. An analysis input table 498 lists inputs for analyses listed in 
analysis table 494. 

On the right side of Fig. 4E are various tables used to support analyses. 
A chip design sequence map table 500 correlates particular fragments with chip designs. 
A sequence position table 502 lists investigated sequence positions indicating their 
positions on a fragment. Records of sequence position table 502 reference a genomic 
sequence position table 504 which gives sequence positions in the genome rather than 
within individual fragments. 

A scan experiment set table 506 lists sets of scan experiments. This 
allows for groupings of experiments for individuals or populations to serve as the basis 
for polymorphism analysis. A scan experiment used table 508 lists records indicating 
memberships of a scan experiment in a scan experiment set. 

A tiling data table 510 lists records identifying tiling designs as 
implemented in particular chips measured by particular scan experiments. An atom data 
table 512 lists the intensities measured for particular sequence positions as measured in 
scan experiments identified by the tiling data records. A subject sequence position data 
table 514 lists combinations of sequence position and scan experiment. 

A series of tables in Figs. 4E-4G correspond to different types of analysis 
that occur during the course of a polymorphism investigation. The types presented here 
are merely representative. A parallel series of tables provide the analysis results. A 
polymorphism analysis table 516 lists references to analysis table 494. The results of the 
performed polymorphism analyses are listed in a polymorphism position result table 518. 
A record of this table gives a result for a polymorphism analysis for a particular position 
as determined based on a particular set of scan experiments. In one embodiment the 
result is whether a particular mutation is certain, likely, possible, or not possible at the 
position. The result may also be that the reference is wrong. 

A user polymorphism analysis table 520 lists user interpretations of results 
as listed in polymorphism position result table 518. The records of user polymorphism 
analysis table 520 are references to analysis table 494. The user interpretations 
themselves are stored in a user polymorphism analysis result table 522. Each result is a
likelihood of a particular mutation at a position as considered by a user plus an
accompanying user comment. 

A P-Hat analysis estimates the relative concentrations of wild type 
sequence and sequence having a particular mutation as determined in a particular scan 
experiment. A P-Hat analysis table 524 lists references to analysis table 494. An atom 
result table 526 gives estimates of the relative concentration along with upper and lower 
bounds and a maximum intensity. For heterozygous mutations, the estimates of relative
concentration will cluster around 0.5. For homozygous mutations, the estimates should
cluster around 1.0.
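
One possible way of interpreting an estimate of relative concentration is sketched below in Python. The cluster centers of 0.5 and 1.0 come from the description above. The center of 0.0 for the wild type, the tolerance of 0.2, the label strings, and the function name are assumptions introduced solely for illustration.

```python
def classify_relative_concentration(p_hat, tolerance=0.2):
    """Label an estimate by its nearest cluster center, or as ambiguous.

    The 0.5 and 1.0 centers follow the description; the 0.0 center and the
    default tolerance are hypothetical.
    """
    centers = {
        "wild type": 0.0,
        "heterozygous": 0.5,
        "homozygous": 1.0,
    }
    label, center = min(centers.items(), key=lambda kv: abs(p_hat - kv[1]))
    if abs(p_hat - center) <= tolerance:
        return label
    return "ambiguous"
```

An estimate lying midway between two centers is reported as ambiguous rather than forced into either cluster, which leaves the decision to a later analysis or to the user.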

Base call analyses are determinations of the base at a particular position
for a particular individual that may be based on more than one experiment. A base call
analysis table 528 lists references to analysis table 494. A base call result table 530 lists 
the called bases for particular combinations of sequence position and subject. 

A P-Hat grouping analysis determines a measure of likelihood that data in 
a set of scan experiments results from separate genotypes. P-hat grouping analyses are 
listed in a p-hat grouping analysis table 532 by reference to analysis table 494. P-hat 
grouping analysis results are listed in a mutation fraction result table 534. A group 
separation is given for various combinations of sequence position and scan experiment 
set. 

A clustering analysis determines an alternative measure of likelihood that 
data in a set of scan experiments results from separate genotypes. Clustering analyses 
are listed in a clustering analysis table 536 by reference to analysis table 494. Clustering 
analysis results are listed in a clustering result table 538. A clustering factor is given for 
various combinations of sequence position and scan experiment set. 

Fig. 4F shows tables which support normalization and footprint finding 
operations that support the analyses referred to in Fig. 4E. Hybridization intensity 
measurements made in scan experiments should be normalized over a set of scan 
experiments. The normalization should take into account differences in amplification 
level produced by different PCR processes. 

Normalization is done by region of sequence. A normalization region 
analysis determines the boundaries of a region to be normalized. The determination of 
boundaries takes into account that different fragments of sequence are amplified by
different PCR procedures. A normalization region analysis table 540 lists normalization
region analyses by reference to analysis table 494. A normalization region result table 
542 lists the boundaries for each determined normalization region. 

Normalization values for identified normalization regions are themselves 
determined by normalization analyses. Normalization analyses are listed in a 
normalization analysis table 544 by reference to analysis table 494. A normalization 
result table 546 lists the normalization values for regions. 

A footprint analysis determines regions of sequence for which the 
hybridization intensity is elevated for the purposes of quality control. Footprint analyses 
are listed in a footprint analysis table 548 by reference to analysis table 494. Footprints 
are identified by sequence starting point and ending point in a particular scan experiment 
in a footprint table 550. 

Fig. 4G depicts tables pertaining to measurement quality according to one 
embodiment of the present invention. A tiling data quality analysis determines the 
quality of results from a scan experiment. These analyses are listed in a tiling data 
quality analysis table 552 by reference to analysis table 494. Tiling data quality analysis 
results are listed in a tiling data quality result table 554. The results include an average 
hybridization intensity value for perfect match or mismatch probes. A wild type call rate 
gives the fraction of atom data where the probe corresponding to the reference base has 
the highest hybridization intensity. A wild type call rate of around 1.0 indicates good 
quality. Where the call rate is less than 0.75, the scan experiment should be rejected. 
An accept data field indicates whether the analysis indicates rejection or acceptance. 
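
The wild type call rate and the resulting accept decision may be sketched in Python as follows. The 0.75 threshold comes from the description above. The representation of an atom as a reference base paired with a mapping from base to intensity, and the function names, are assumptions made for illustration. Ties between probes are resolved here in favor of the first base encountered, which is likewise an arbitrary choice.

```python
def wild_type_call_rate(atoms):
    """Fraction of atoms whose brightest probe matches the reference base.

    Each atom is a (reference_base, {base: intensity}) pair; this layout is
    hypothetical and chosen only for the sketch.
    """
    atoms = list(atoms)
    if not atoms:
        raise ValueError("at least one atom is required")
    calls = 0
    for reference_base, intensities in atoms:
        brightest = max(intensities, key=intensities.get)
        if brightest == reference_base:
            calls += 1
    return calls / len(atoms)


def accept_scan_experiment(atoms, threshold=0.75):
    """Reject the scan experiment when the call rate is below the threshold."""
    return wild_type_call_rate(atoms) >= threshold
```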

Where scan experiment measurements indicate two or more non-wild type 
bases within a probe length, this indicates a measurement problem for the affected region 
of sequence. These regions are identified by difficult region analyses listed in a difficult 
region analysis table 556 by reference to analysis table 494. A difficult region result 
table 558 lists the regions identified as being difficult. 
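
A minimal sketch of how difficult regions might be located is given below in Python. It assumes that the sequence positions called as non-wild type are already known, that two positions fall within a probe length when they are fewer than the probe length apart, and that overlapping runs are merged into one region. These assumptions, and the function name, are illustrative only.

```python
def find_difficult_regions(non_wild_type_positions, probe_length):
    """Return (start, end) spans holding two or more nearby non-wild-type calls.

    Positions closer together than probe_length are chained into one region;
    isolated positions are not reported.
    """
    positions = sorted(set(non_wild_type_positions))
    regions = []
    start = previous = None
    count = 0
    for position in positions:
        if previous is not None and position - previous < probe_length:
            # Still within one probe length of the last call: extend the run.
            previous = position
            count += 1
        else:
            if count >= 2:
                regions.append((start, previous))
            start = previous = position
            count = 1
    if count >= 2:
        regions.append((start, previous))
    return regions
```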

Analysis dependency table 496 indicates interrelationships among the 
various analyses of Figs. 4E-4G. A footprint analysis may depend on a normalization 
analysis which may in turn depend on a normalization region analysis. A basecall 
analysis or PHatGrouping analysis may depend on an atom analysis. A polymorphism
analysis may depend on any of these analyses and/or a user polymorphism analysis
and/or a clustering analysis. 
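
The interrelationships just described imply an order in which analyses can be run. As a sketch under stated assumptions, the dependencies may be expressed in Python and ordered with the standard library's graphlib module (Python 3.9 or later). The edges are taken from the preceding paragraph; the node labels and the function name are hypothetical, and in practice the edges would be read from analysis dependency table 496.

```python
from graphlib import TopologicalSorter

# Each key depends on every analysis in its value set.
DEPENDENCIES = {
    "normalization region": set(),
    "normalization": {"normalization region"},
    "footprint": {"normalization"},
    "atom": set(),
    "basecall": {"atom"},
    "PHatGrouping": {"atom"},
    "clustering": set(),
    "user polymorphism": set(),
    "polymorphism": {
        "footprint",
        "basecall",
        "PHatGrouping",
        "clustering",
        "user polymorphism",
    },
}


def execution_order(dependencies):
    """List analyses so that each one appears after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```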

Another aspect of the investigation of polymorphisms is seeking patent
protection for identified polymorphisms. Fig. 4H shows tables of polymorphism
database 102 related to efforts to seek patent protection according to one embodiment of
the present invention. A polymorphism patent sequence table 560 lists sequences for
which patent protection is sought. A patent application table 562 lists patent applications
directed toward the protection of polymorphisms. A polymer patent application sequence
map table 564 implements a many-to-many relationship between patent applications and
sequences. A prior application table 566 lists relationships between patent applications
and prior related patent applications. An attorney table 568 lists attorneys responsible
for preparing patent applications listed in patent application table 562. A law firm table
570 lists the law firms to which the attorneys listed in attorney table 568 belong.

An employee group table 572 lists groups of inventors for the patent
applications listed in table 562. Individual inventors are listed in employee table 432.
An employee group map table 574 implements a many-to-many relationship between
inventors and groups of inventors.

The data model of Fig. 4H greatly facilitates the process of securing patent
protection for polymorphisms and thereby increases the commercial incentive for
investigation of polymorphisms.

Database Contents 

The contents of the tables introduced above will now be presented in 
greater detail in the following chart. 



TABLE
    FIELD    COMMENT


tblSubject
    SubjectId:INTEGER    Identifies biological source of sample.
    SpeciesID:INTEGER    Species of subject.
    Name:VARCHAR2(20)    Name of subject (anonymized for human subjects).
    Gender:VARCHAR2(10)    Gender of subject.
    Family:VARCHAR2(20)    Family of subject (anonymized for human subjects).
    Member:SMALLINT    Position in family (father, mother, etc.).
    Group_:VARCHAR2(20)    Ethnic group.
    CellLineID:VARCHAR2(20)    Identifier for sample source not associated with particular organism.
    IsReference:SMALLINT    Whether or not subject is in a group.


tblSpecies
    SpeciesId:INTEGER    Species identifier.
    Name:VARCHAR2(30)    Name of species.

SubjectRelationship
    Subject1:INTEGER    First subject in relationship.
    Subject2:INTEGER    Second subject in relationship.
    Position:VARCHAR2(2)    Nature of relationship.


tblSubjectGroup
    GroupId:INTEGER    Identifier of group of subjects (not same as ethnic group).
    GroupCode:VARCHAR2(20)    Code identifier for group.
    Comments:LONG VARCHAR    User comments on group.
    upsize_ts:DATE    Creation date for group.

tblSubjectParticipation
    SubjectId:INTEGER    Reference to subject table.
    GroupId:INTEGER    Reference to subject group table.


tblSample
    SampleId:INTEGER    Sample identifier.
    SubjectID:INTEGER    Reference to subject table.
    SampleSourceId:CHAR(18)    Institutional source of sample.
    Code:VARCHAR2(20)    Code representing individual subject.
    Recipient:VARCHAR2(20)    Person accepting sample.
    Provider:VARCHAR2(20)    Person or institution providing sample.
    DateReceived:DATE    Date sample received.
    ProtocolId:INTEGER    Reference to protocol table.
    SampleTypeId:INTEGER    Reference to sample type table.


tblSampleType
    SampleTypeId:INTEGER    Sample type identifier.
    Description:VARCHAR2(50)    Description of sample type.

tblSampleSource
    SampleSourceId:CHAR(18)    Identifier of institutional sample source.
    ProviderName:VARCHAR2(20)    Name of individual or institutional sample provider.


Item
    ItemId:INTEGER    Item identifier.
    ItemTypeId:INTEGER    Item type identifier.
    ItemName:VARCHAR2(50)    Name of item.






SUBSTITUTE SHEET (RULE 26) 



WO 99/05324 



PCT/US98/15458



ItemDerivation
    Item1Id:INTEGER    Derivation source.
    Item2Id:INTEGER    Derivation result.
    EmployeeId:INTEGER    Employee responsible for derivation.
    DerivationTypeId:INTEGER    Derivation type identifier.
    ProtocolId:VARCHAR2(18)    Reference to protocol table.
    Date:DATE    Date of derivation.


tblChip
    ChipId:INTEGER    Renamed reference to item table.
    ChipDesignPlacementId:INTEGER    Placement on wafer.
    LocationId:INTEGER    Location of chip.
    WaferId:INTEGER    Wafer the chip was on.

tblHybedChip
    HybedChipId:INTEGER    Renamed reference to item table.
    SubjectID:INTEGER    Reference to subject table.
    ProtocolId:INTEGER    Reference to protocol table.
    Repetition:SMALLINT    Refers to number of times chip has been washed and reused.


tblHybSampleMap
    ItemId:INTEGER    Reference to item table.

Protocol
    ProtocolId:INTEGER    Protocol identifier.
    ProtocolTemplateId:INTEGER    Protocol template identifier.
    Name:VARCHAR2(100)    Name of protocol.


tblScanExperiment
    ScanExptId:INTEGER    Scan experiment identifier.
    ItemId:INTEGER    Reference to item table.
    ScanCode:VARCHAR2(25)    File for scan results.
    ProtocolId:INTEGER    Reference to protocol table.
    ScanRatingId:INTEGER    Assessment of scan quality.
    ExperimenterId:INTEGER    Experimenter identifier.
    Date:DATE    Date of experiment.
    ConversionTool:VARCHAR2(50)    Program used to convert from scan image to intensities.
    ConversionDate:DATE    Date of conversion.
    ScanStatus:VARCHAR2(50)    Whether or not scan image has been converted to intensities.
    Comments:LONG VARCHAR    Comments.

Employee
    EmployeeId:INTEGER    Employee identifier.
    EmployeeCode:VARCHAR2(5)    Code for employee.
    FName:VARCHAR2(20)    First name of employee.
    MName:VARCHAR2(20)    Middle name of employee.
    LName:VARCHAR2(20)    Last name of employee.


ItemType
    ItemId:INTEGER    Item type identifier.
    ItemTypeName:VARCHAR2(30)    Name of item type.
    FormName:VARCHAR2(100)    Reference to user interface form for item type.

ItemDerivationType
    DerivationTypeId:INTEGER    Derivation type identifier.
    DerivationType:VARCHAR2(50)    Description of derivation type.

ItemTypeDerivation
    NextItemTypeId:INTEGER    Result type of derivation.
    ItemTypeId:INTEGER    Source type of derivation.


ItemAttribute
    ItemAttributeId:INTEGER    Item attribute identifier.
    ItemAttributeTypeId:INTEGER    Reference to item attribute type table.
    Attribute:VARCHAR2(50)    Attribute value.

ItemAttributeItemMap
    ItemAttributeId:INTEGER    Reference to item attribute table.
    ItemId:INTEGER    Reference to item table.

ItemAttributeType
    ItemAttributetypeId:INTEGER    Item attribute identifier.
    ItemAttributeName:VARCHAR2(30)    Name of item attribute type.

ItemTypeMap
    ItemAttributeTypeId:INTEGER    Reference to item attribute type table.
    ItemTypeId:INTEGER    Reference to item type table.


ProtocolTemplate
    ProtocolTemplateId:INTEGER    Protocol template identifier.
    Name:VARCHAR2([illegible])    Name of protocol template.
    DateCreated:DATE    Date protocol template created.
    FormName:VARCHAR2(50)    Name of the electronic form used for protocol template.


Parameter
    ParameterId:INTEGER    Parameter identifier.
    ParameterTemplateId:INTEGER    Reference to parameter template table.
    Value:VARCHAR2(20)    Value of parameter.
    ProtocolID:INTEGER    Reference to protocol table.

ParameterTemplate
    ParameterTemplateId:INTEGER    Parameter template identifier.
    Name:VARCHAR2(100)    Name of parameter template.
    ParamTemplateSetId:INTEGER    Reference to parameter template set table.
    StandardValue:VARCHAR2(100)    Default value for parameter.


ParamTemplateSet
    ParamTemplateSetId:INTEGER    Parameter template set identifier.
    TypeId:INTEGER    Renamed reference to parameter template set type table.
    Name:VARCHAR2(20)    Name of parameter template set.

ParamTemplateSetType
    ParamTempSetTypeId:INTEGER    Parameter template set type identifier.
    Description:VARCHAR2(50)    Description of parameter template set type.

ParameterTemplateSetMap
    ProtocolTemplateId:INTEGER    Reference to protocol template table.
    ParamTemplateSetId:INTEGER    Reference to parameter template set table.


ProtocolTemplateDoc
    ProtocolDocId:INTEGER    Protocol template document identifier.
    ProtocolTemplateId:INTEGER    Reference to protocol template table.
    Name:VARCHAR2(100)    Name of protocol template.
    PathAndFileName:VARCHAR2(50)    File name for protocol template document.
    AuthorName:INTEGER    Author of protocol template document.
    CreationDate:DATE    Creation date of protocol template document.


tblFragment
    FragmentId:INTEGER    Fragment identifier.
    ChipSequence:LONG VARCHAR    Sequence of fragment.
    Code:VARCHAR2(50)    Code representing fragment.

tblPrimerPair
    PrimerPairId:INTEGER    Identifier for primer pair.
    LeftPrimerId:INTEGER    Left primer identifier.
    RightPrimerId:INTEGER    Right primer identifier.
    PCRSize:INTEGER    Length of amplified fragment.
    Worked:SMALLINT    Whether or not pair successfully amplified fragment.
    FragmentId:INTEGER    Reference to fragment table.


tblPCR
    Item1Id:INTEGER    First part of reference to item derivation table.
    Item2Id:INTEGER    Second part of reference to item derivation table.
    Reactionworked:SMALLINT    Whether or not PCR reaction worked.

PrimePairPCRMap
    PrimerPairId:INTEGER    Reference to primer pair table.
    Item1ID:INTEGER    First part of referenced item derivation table.
    Item2Id:INTEGER    Second part of referenced item derivation table.


tblPrimer
    PrimerId:INTEGER    Primer identifier.
    ProtocolId:INTEGER    Reference to protocol table.
    OligoSeq:VARCHAR2(35)    Sequence of primer.
    Position:INTEGER    Position of primer on fragment.
    Length:INTEGER    Length of primer.
    MeltingTemp:INTEGER    Melting temperature of primer.
    Direction:VARCHAR2(20)    Direction (forward or reverse).


tblPrimerOrder
    OrderId:INTEGER    Order identifier.
    EmployeeId:INTEGER    Employee who made order.
    VendorId:INTEGER    Vendor for order.
    OrderDate:DATE    Date of order.
    Owner:VARCHAR2(50)    Name of employee making order.
    Vendor:VARCHAR2([illegible])    Name of vendor.


tblVendor
    VendorId:INTEGER    Vendor identifier.
    Vendor:VARCHAR2(50)    Name of vendor.
    PhoneNumber:VARCHAR2(15)    Phone number of vendor.
    FaxNumber:VARCHAR2(15)    Fax number of vendor.
    Address:VARCHAR2(50)    Address of vendor.
    City:VARCHAR2(50)    City of vendor.
    State:VARCHAR2(50)    State of vendor.
    Zip:VARCHAR2(50)    Zip code of vendor.

tblPrimerOrderDesignMap
    PrimerId:INTEGER    Reference to primer table.
    OrderId:INTEGER    Reference to order table.


tblWafer
    WaferId:INTEGER    Wafer identifier.
    LotId:INTEGER    Lot to which wafer belongs.
    Code:VARCHAR2(8)    Code for wafer.
    SynthesisDate_delete:DATE    Synthesis date for wafer.
    Released:DATE    Date wafer available.
    Done:SMALLINT    Whether wafer production is complete.
    ExpirationDate:DATE    Expiration date of wafer.
    ExpectedLife:CHAR(18)    Expected useful life of wafer.

tblLot
    LotId:INTEGER    Lot identifier.
    WaferDesignId:INTEGER    Identifier for wafer design.
    LotNumber:VARCHAR2(12)    Lot number.
    WaferPN:VARCHAR2(50)    Part number for wafer.


tblTilingDesign
    TilingDesignID:INTEGER    Tiling design identifier.
    ChipDesignSequenceMapID:NUMBER    Reference to chip design sequence map.
    TilingFormatID:INTEGER    Reference to tiling format table.
    UnitNumber:INTEGER    1 for sense, 0 for antisense.
    AtomOffset:INTEGER    Number to add to translate atom position in tiling to atom position in chip design.

tblTilingFormat
    TilingFormatID:INTEGER    Tiling format identifier.
    Orientation:CHAR(18)    Orientation for tiling.
    ProbeLength:SMALLINT    Length of probes.
    SubstitutionPosition:SMALLINT    Substitution position for mutation base in probes.


tblAtomDesign
    AtomDesignId:NUMBER    Atom design identifier.
    TilingDesignID:INTEGER    Reference to tiling design table.
    Position:INTEGER    Position of atom in sequence.

tblProbeDesign
    ProbeDesignID:NUMBER    Probe design identifier.
    ChipDesignId:INTEGER    Reference to chip design.
    x:SMALLINT    x position of probe.
    y:SMALLINT    y position of probe.


tblProbeDesignRole
    ProbeDesignID:NUMBER    Reference to probe design table.
    AtomDesignID:NUMBER    Reference to atom design table.
    Substitution:CHAR(18)    Substitution position in probe design.
    Mismatches:NUMBER    Whether probe is match or mismatch.

tblProbeData
    ProbeDesignID:NUMBER    Reference to probe design table.
    ScanExptID:INTEGER    Reference to scan experiment table.
    Intensity:FLOAT    Measured hybridization intensity for probe.
    NPixels:NUMBER    Number of pixels used for intensity calculation.
    StDev:NUMBER    Standard deviation for pixels.


tblAnalysis 
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TABLE 


FIELD 


COMMENT 




Analysisld :INTEGER 


Analysis 
identifier. 




/vUoljrolo V ClMUUlJJ . ill 1 HvJUix 


jtvcicrencc to 
version of 
analysis. 




ProtocolID: INTEGER 


Reference to 
protocol table. 




DatePerformedrDATE 


Date analysis 
performed. 




NeedsUpdate:NUMBER


Whether
analysis is 
current. 


tblAnalysisDependency 








ParentAnalysisId:INTEGER


Analysis 

providing 

input. 




SubAnalysisId:INTEGER


Analysis

receiving 
input. 




Role:VARCHAR2(20) 


Role of data 
provided by
parent 
analysis. 


tblAnalysisInput








AnalysisInputID:INTEGER


Analysis input 
identifier. 







AnalysisId:INTEGER 


Analysis
receiving 
input. 




Inputtype:VARCHAR2(20) 


Type of input. 




ObjectID:INTEGER 


Reference to 
input data. 


tblChipDesignSequenceMap








ChipDesignSequenceMapID:NUMBER


Chip design
sequence map 
identifier. 




FragmentID:INTEGER


Reference to 

fragment 

table. 




ChipDesignId:INTEGER 


Chip design 
identifier. 




AtomOffset:NUMBER


# to add to 
translate atom 
position in 
tiling to atom
position in
chip design


tblSequencePosition 








SequencePositionID:NUMBER


Sequence position 
identifier. 




ChipDesignSequenceMapID:NUMBER


Reference to chip 
design sequence 
map table. 







Position:NUMBER


Position in 
fragment. 




GenomicSequencePositionID:INTEGER


Reference to 
genomic sequence 
position table. 




RefBase:INTEGER 


Reference base. 


tblGenomicSequencePosition 








GenomicSequencePositionID:INTEGER 


Genomic 
sequence position 
identifier. 


tblScanExperimentSet 








ScanExperimentSetID:NUMBER 


Scan experiment 
set identifier. 


tblScanExperimentUsed








ScanExptID:INTEGER


Reference to scan 
experiment table. 




ScanExperimentSetID:NUMBER


Reference to scan 
experiment set 
table. 


tblTilingData 








TilingDataID:NUMBER


Tiling data 
identifier. 




ScanExptID:INTEGER


Reference to scan 
experiment table. 




TilingDesignID:INTEGER


Reference to 
tiling design 
table. 


tblAtomData 




AtomDataID:INTEGER


Atom data 
identifier. 




TilingDataID:NUMBER


Reference to 
tiling data table. 




SubjectSequencePositionID:INTEGER 


Reference to 
subject sequence 
position table. 


tblSubjectSequencePosition 








SubjectSequencePositionID:INTEGER


Subject sequence 

position 

identifier. 




SubjectID:INTEGER


Reference to 
subject table. 




SequencePositionID:NUMBER


Reference to 
sequence position 
table. 


tblPolymorphismAnalysis 








AnalysisId:INTEGER 


Reference to 
analysis table. 


tblPolyPositionResult 








AnalysisId:INTEGER


Reference to 
analysis table. 




PolyPositionID:INTEGER 


Polymorphism 

position 

identifier. 




ScanExperimentSetID:NUMBER 


Reference to scan 
experiment set 
table. 




PolyPositionTypeID:INTEGER


Refers to
possibility of
polymorphism at
position (e.g.,
certain, likely,
possible,
mismatch,
wrong).




WTBase:CHAR(18)


Wild type base at
position.




MuBase:INTEGER 


Mutation base at 
position. 


tblUserPolyanalysis 








AnalysisId:INTEGER


Reference to 
analysis table. 


tblUserPolyanalysisResult 








AnalysisId:INTEGER


Reference to
analysis table.




SequencePositionID:NUMBER


Reference to 
sequence position 




ScanExperimentSetID:NUMBER 


Reference to scan 
experiment set 
table. 




PolyPositionTypeID:INTEGER


See 

polymorphism 
position result 
table. 
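
By way of illustration only, two of the tables listed above, tblProbeDesign and tblProbeData, might be translated into SQL roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module rather than any particular database product. The column names and types are taken from the table; the PRIMARY KEY and REFERENCES clauses, the helper names build_demo_db and intensities_for_scan, and the sample values are assumptions introduced for the example and are not part of the described model.

```python
import sqlite3

# Column names and types follow the FIELD column of the table above.
# The PRIMARY KEY and REFERENCES clauses are assumptions inferred from the
# "identifier" and "Reference to ..." comments; they are not stated in the text.
SCHEMA = """
CREATE TABLE tblProbeDesign (
    ProbeDesignID NUMBER PRIMARY KEY,  -- probe design identifier
    ChipDesignId  INTEGER,             -- reference to chip design
    x             SMALLINT,            -- x position of probe
    y             SMALLINT             -- y position of probe
);
CREATE TABLE tblProbeData (
    ProbeDesignID NUMBER REFERENCES tblProbeDesign(ProbeDesignID),
    ScanExptID    INTEGER,             -- reference to scan experiment table
    Intensity     FLOAT,               -- measured hybridization intensity
    NPixels       NUMBER,              -- pixels used for intensity calculation
    StDev         NUMBER               -- standard deviation for pixels
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up probe and one reading."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Sample values are invented purely for demonstration.
    conn.execute("INSERT INTO tblProbeDesign VALUES (1, 10, 3, 4)")
    conn.execute("INSERT INTO tblProbeData VALUES (1, 100, 1523.5, 36, 12.25)")
    return conn


def intensities_for_scan(conn, scan_expt_id):
    """Return (x, y, Intensity) for every probe measured in one scan experiment."""
    query = """
        SELECT d.x, d.y, p.Intensity
        FROM tblProbeData AS p
        JOIN tblProbeDesign AS d ON p.ProbeDesignID = d.ProbeDesignID
        WHERE p.ScanExptID = ?
        ORDER BY d.x, d.y
    """
    return conn.execute(query, (scan_expt_id,)).fetchall()
```

Calling intensities_for_scan(build_demo_db(), 100) joins each measured intensity back to the probe's position on the chip, which is the kind of cross-table lookup the "Reference to" fields are intended to support.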